Degree-Based Graph Entropy in Structure–Property Modeling

Graph entropy plays an essential role in interpreting the structural information and complexity of a network. Let $G$ be a graph of order $n$, and let $d_G(v_i)$ be the degree of the vertex $v_i$ for each $i = 1, 2, \ldots, n$. The $k$-th degree-based graph entropy of $G$ is defined as $I_{d,k}(G) = -\sum_{i=1}^{n} \frac{d_G(v_i)^k}{\sum_{j=1}^{n} d_G(v_j)^k} \log \frac{d_G(v_i)^k}{\sum_{j=1}^{n} d_G(v_j)^k}$, where $k$ is a real number. The first degree-based entropy is obtained for $k = 1$ and has been well studied in the last few years. As $\sum_{j=1}^{n} d_G(v_j)^k$ yields the well-known first Zagreb index for $k = 2$, the entropy $I_{d,2}$ is worthy of investigation. We call this graph entropy the second-degree-based entropy. The present work aims to investigate the role of $I_{d,2}$ in structure-property modeling of molecules.

Introduction
Graph theory has developed into a powerful mathematical tool in a wide range of disciplines, including operations research, chemistry, genetics, and linguistics, as well as electrical engineering, geography, sociology, and architecture. In addition, it has grown into a useful field of mathematics in its own right. Many real-world situations can be described simply by a diagram composed of a collection of points with lines connecting specific pairs of these points. Chemists work with graphs on a daily basis because almost all chemical communication is carried out through the graphic representation of compounds and reactions. Chemical graph theory thus appears to be the natural language through which chemists communicate. One of the important tools in this area is the graph invariant, which is any property of a molecular graph that remains unchanged under graph isomorphism. Numerous kinds of graph invariants have appeared in the literature based on different graph parameters. One such important parameter is the degree of a vertex, which is defined as the number of edges incident to it. For a molecular graph, it represents the valency of the corresponding atom. For degree-based invariants, readers are referred to the article [1] and the references cited therein. The present work deals with degree-based graph entropy. Shannon et al. [2] put forward the concept of entropy in 1949, and it is now one of the most significant measures in information theory as an indicator of the randomness of information content. This idea was applied to graphs in 1955 [3], employing certain probability distributions associated with the automorphisms of graphs. Graph entropies vary depending on the probability distributions set on the graph. Dehmer's graph entropies [4], based on information functionals, are one of the highlights and have led to many significant research insights in the fields of information science, graph theory, and network science.
Entropy corresponding to the independent sets and the matchings of graphs is investigated in [5]. Bounds on such entropy measures were established in 2020 [6]. The distance between two vertices is employed to design a new type of entropy in [7], whose upper bounds were established by Ilić and Dehmer [8]. Cao et al. [9] investigated numerous attributes of an entropy measure formulated on an information functional that considers degree powers of graphs. For insight into degree powers, readers are referred to [1,10,11]. For further knowledge on graph entropy, see the survey [12].
For a probability distribution $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$, the entropy $I(\alpha)$ due to Shannon is defined as
$$I(\alpha) = -\sum_{i=1}^{n} \alpha_i \log \alpha_i.$$
Let $G$ be a finite, undirected and connected graph with vertex set $V(G) = \{v_1, v_2, \ldots, v_n\}$, and let $\phi$ be an information functional assigning a positive value to each vertex. Setting $\alpha_i = \phi(v_i) / \sum_{j=1}^{n} \phi(v_j)$ gives a probability distribution, so that the entropy of $G$ based on $\phi$ is formulated as
$$I_{\phi}(G) = -\sum_{i=1}^{n} \frac{\phi(v_i)}{\sum_{j=1}^{n} \phi(v_j)} \log \frac{\phi(v_i)}{\sum_{j=1}^{n} \phi(v_j)}.$$
Different entropies can be generated by varying $\phi$. Cao et al. [9] considered $\phi(v_i) = d_G(v_i)^k$, where $d_G(v_i)$ is the degree of the vertex $v_i$, and proposed the $k$-th degree-based graph entropy as follows:
$$I_{d,k}(G) = -\sum_{i=1}^{n} \frac{d_G(v_i)^k}{\sum_{j=1}^{n} d_G(v_j)^k} \log \frac{d_G(v_i)^k}{\sum_{j=1}^{n} d_G(v_j)^k},$$
where $k$ is a real number. For $k = 1$, the entropy $I_{d,1}$ is named the first degree-based entropy in [9], where extremal graphs for different classes are characterized. For more works on such measures, see [13][14][15]. For $k = 2$, the quantity $\sum_{j=1}^{n} d_G(v_j)^2$ is the first Zagreb index [1,10,16,17], which is well known and widely used in chemical graph theory. Thus, it is worthwhile to investigate the entropy measure $I_{d,2}$. We call it the second-degree-based entropy.
Predictive quantitative structure-property relationship (QSPR) models play an essential role in the design of purpose-specific fine chemicals such as pharmaceuticals. It is usually very costly to test a compound in a wet lab, and a QSPR study allows that cost to be reduced. Topological indices play an important role in establishing structure-property relationships of molecules. For some recent works on this analysis, readers are referred to [18][19][20][21]. The ultimate goal of the present work is to investigate the role of the second-degree-based graph entropy $I_{d,2}$ in structure-property modeling of molecules.
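The definition of the $k$-th degree-based entropy depends only on the degree sequence of the graph, so it can be evaluated in a few lines. The following is a minimal sketch rather than the authors' implementation: the function name `degree_entropy` is hypothetical, and the natural logarithm is assumed because the base of the logarithm is not specified in the text.

```python
import math

def degree_entropy(degrees, k=2.0):
    """k-th degree-based entropy I_{d,k} computed from a list of vertex degrees.

    Assumes a connected graph on at least two vertices (all degrees positive)
    and the natural logarithm; the log base is an assumption.
    """
    weights = [d ** k for d in degrees]        # d_G(v_i)^k
    total = sum(weights)                       # sum_j d_G(v_j)^k
    probs = [w / total for w in weights]       # probability assigned to each vertex
    return -sum(p * math.log(p) for p in probs)

# Path on four vertices (carbon skeleton of n-butane): degrees 1, 2, 2, 1.
path4 = [1, 2, 2, 1]
# Star on four vertices (carbon skeleton of isobutane): degrees 3, 1, 1, 1.
star4 = [3, 1, 1, 1]

print(degree_entropy(path4, k=1))  # first degree-based entropy
print(degree_entropy(path4, k=2))  # second-degree-based entropy
print(degree_entropy(star4, k=2))
```

For a regular graph all vertices receive the same probability, so the entropy reduces to $\log n$ for every $k$; the more uneven the degree powers are, the smaller the value becomes.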

Application Potential of Entropy
Topological indices abound and continue to grow in number. The majority of them are treated purely mathematically, with little attention to their chemical relevance. As a result, a set of desirable attributes was compiled to aid in selecting a pertinent molecular descriptor from a vast pool of candidates. Among these attributes is the ability to predict the properties and activities of molecules. To examine the predictive ability of topological indices, quantitative structure-property relationship analysis is usually performed on theoretical attributes and experimental measures of some benchmark chemicals. The entropy-based indices have been well studied in mathematical chemistry from a mathematical standpoint. Our aim is to illustrate the chemical relevance of the entropy corresponding to the first Zagreb index. First, we consider the octane isomers as a benchmark data set. As octanes contain no cycles, we then take into account some hydrocarbons having cycles as substructures.
The molecular graph representations of octane isomers are displayed in Figure 1. The numerical values of different properties and the $I_{d,2}$ index are reported in Table 1. The $I_{d,2}$ index is found to have a significant correlation with entropy (S), enthalpy of vaporization (HVAP), standard enthalpy of vaporization (DHVAP), and acentric factor (AF). We investigate the following linear relation to examine the potential of $I_{d,2}$:
$$P = C_1(\pm E_1)\, I + C_2(\pm E_2),$$
where $P$, $I$, $C_1$, $C_2$, and the $E_i$'s represent the property, index, slope, intercept, and errors, respectively. The regression analysis also reports the standard error (SE), the F-test value (F), and the significance F (SF), in addition to the correlation coefficient R, to allow a more accurate judgment. For S and AF, $I_{d,2}$ yields the following structure-property relationships.
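The regression statistics named above (R, SE, and F) can be obtained directly from the sums of squares of a least-squares line. The sketch below is illustrative only: the data are made up and are not values from Table 1, the helper name `simple_linear_regression` is hypothetical, and SE is taken to be the standard error of the regression, which is an assumption about the reported quantity.

```python
import math

def simple_linear_regression(x, y):
    """Least-squares fit of y = slope * x + intercept with basic diagnostics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)                                # total sum of squares
    ssr = sst - sse                                                          # explained sum of squares

    r2 = max(ssr / sst, 0.0)
    r = math.copysign(math.sqrt(r2), slope)   # correlation coefficient carries the sign of the slope
    se = math.sqrt(sse / (n - 2))             # standard error of the regression (assumed meaning of SE)
    f_stat = ssr / (sse / (n - 2))            # F statistic with (1, n - 2) degrees of freedom
    return {"slope": slope, "intercept": intercept, "R": r, "R2": r2, "SE": se, "F": f_stat}

# Made-up (index, property) values for illustration only.
index_values = [1.0, 2.0, 3.0, 4.0, 5.0]
property_values = [2.1, 3.9, 6.2, 7.8, 10.1]

result = simple_linear_regression(index_values, property_values)
for name, value in result.items():
    print(name, value)
```

The significance F is the upper-tail probability of the F statistic under an F distribution with $(1, n-2)$ degrees of freedom. It is omitted here to keep the sketch free of external dependencies; a statistics library that provides the F-distribution survival function would supply it.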
The linear fittings of relations (4) and (5) are shown in Figure 2.
The strength of the structure-property relationships (6) and (7) is displayed in Figure 3. The blue circles in Figures 2 and 3 are the points $(x, y)$, where $x$ and $y$ represent the $I_{d,2}$ value and the property for octanes, respectively, and the red line indicates the regression line.
From the $R^2$ values, one can say that the models explain 88%, 81%, 89%, and 94% of the data variance for S, HVAP, DHVAP, and AF, respectively. The blue circles for AF in Figure 3 are closer to the regression line than those in the other frames. A smaller SE value indicates a stronger regression relation. Each of the aforesaid equations yields a small SE, and the SE for AF is especially low. The model's consistency improves as the F-value rises, and the F-value in model (7) is comparatively high. The model is regarded as statistically reliable when the SF value is less than 0.05. In each case, the SF value is significantly less than 0.05. Thus, one can conclude that the second-degree-based entropy performs better in explaining the acentric factor than in explaining S, HVAP, and DHVAP.
Now, we perform external validation for the constructed model in the case of AF. The nonane isomers are considered here as an external data set. The set is divided into training and test sets in the ratio 80:20 by means of the Python scikit-learn machine learning module. The training set is used to generate the model, which is validated on the test set. Relation (8) expresses the structure-property relationship on the training set, where 86% of the data variance is explained. The plot of predicted data against experimental data and the random scattering in the residual plot (see Figure 4) indicate that the model on the training set is well aligned and consistent. On the test set, 82% of the data variance is explained, which supports the external validation.
The correlations of $I_{d,2}$ and of some well-known degree-based indices with the properties of octanes are compared in Table 2. In the case of S, $I_{d,2}$ performs better than ISI, SDD, SCI, and RR. The present invariant outperforms $M_1$, F, $M_2$, ISI, and RR for HVAP. The correlation of $I_{d,2}$ with DHVAP is better than that of $M_1$, F, $M_2$, ISI, and RR. In the case of AF, the current descriptor outperforms F, ISI, SDD, and SCI.
Now, we consider some benzenoid hydrocarbons (BHCs) for investigation. The molecular structures of BHCs are shown in Figure 5.
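Returning to the external validation of the AF model, the 80:20 train/test procedure can be sketched as follows. This is a dependency-free illustration, not the authors' script: the text states that scikit-learn was used, whereas here a plain shuffle-and-slice stands in for that library's splitting utility, the data are synthetic rather than the nonane values, and all function names are hypothetical.

```python
import math
import random

def split_80_20(pairs, seed=0):
    """Shuffle (x, y) pairs reproducibly and cut them 80:20 into train and test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def fit_line(pairs):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(pairs, slope, intercept):
    """Share of the variance of y explained by the given line on these pairs."""
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)
    sst = sum((y - mean_y) ** 2 for _, y in pairs)
    return 1.0 - sse / sst

# Synthetic stand-in for 35 (index, property) pairs; nonane has 35 structural isomers.
# The linear trend and the small sine perturbation are invented for illustration.
data = [(1.5 + 0.02 * i, 0.9 - 0.3 * (1.5 + 0.02 * i) + 0.004 * math.sin(i))
        for i in range(35)]

train, test = split_80_20(data)
slope, intercept = fit_line(train)           # the model is built on the training set only
print(len(train), len(test))
print(r_squared(train, slope, intercept))    # fit quality on the training set
print(r_squared(test, slope, intercept))     # the same line evaluated on unseen points
```

The key design point is that the slope and intercept are estimated from the training pairs alone, so the $R^2$ on the test pairs measures how well the relation transfers to compounds that did not influence the fit.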
The second-degree-based entropy is observed to correlate well with the boiling point (BP) of benzenoid hydrocarbons. The BP and $I_{d,2}$ values are reported in Table 3. The $I_{d,2}$ index is also found to have a significant correlation with the π-electron energy ($E_\pi$) of benzenoid hydrocarbons. The regression relations for BP and $E_\pi$ are as follows:
From relations (9) and (10), we can say that the models explain 94% and 98% of the data variance for BP and $E_\pi$, respectively. The corresponding linear fittings are shown in Figure 6. For comparative purposes, we correlate some well-known degree-based indices with the boiling point and π-electron energy of benzenoid hydrocarbons. The correlation coefficients displayed in Table 4 show that $I_{d,2}$ outperforms some of those well-established indices.
Now, we consider some molecular graphs having cyclic substructures that are useful in drug preparation. These compounds are Aminopterin, Aspidostomide E, Carmustine, Caulibugulone E, Convolutamine F, Convolutamydine A, Tambjamine K, Deguelin, Perfragilin A, Melatonin, Minocycline, Podophyllotoxin, Pterocellin B, Daunorubicin, Convolutamide A, and Raloxifene. The molecular graphs of these structures are displayed in Figure 7. The experimental and theoretical measures for these compounds are reported in Table 5.
Equations (11) and (12) reveal that the coefficients of determination for boiling point (BP) and molar refraction (MR) are 73% and 90%, respectively. The linear fittings of the aforementioned structure-property relationships are shown in Figure 8. To compare the present descriptor with $M_1$, F, $M_2$, ISI, SDD, SCI, and RR, we correlate these degree-based indices with BP and MR for the chemicals displayed in Figure 7. The correlation coefficients reported in Table 6 imply that $I_{d,2}$ performs better than some of the well-known and most-used indices.
The correlation of $I_{d,2}$ with the other degree-based indices is reported in Table 7 for decane isomers. It shows that the first Zagreb index and the forgotten topological index are strongly correlated with $I_{d,2}$. Thus, there is a possibility of a strong mathematical relation between them.
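One elementary link between $I_{d,2}$ and the first Zagreb index follows from expanding the logarithm in the definition: $I_{d,2}(G) = \log M_1(G) - \frac{1}{M_1(G)} \sum_{i=1}^{n} d_G(v_i)^2 \log d_G(v_i)^2$, where $M_1(G) = \sum_{i} d_G(v_i)^2$. This identity is offered here as an illustration and is not a result stated in the text. The sketch below checks it numerically on three decane isomers, using degree sequences written out by hand from the carbon skeletons; the function names are hypothetical and the natural logarithm is assumed.

```python
import math

def first_zagreb(degrees):
    """First Zagreb index M_1: sum of squared vertex degrees."""
    return sum(d ** 2 for d in degrees)

def forgotten_index(degrees):
    """Forgotten topological index F: sum of cubed vertex degrees."""
    return sum(d ** 3 for d in degrees)

def second_degree_entropy(degrees):
    """I_{d,2} evaluated directly from its definition (natural log assumed)."""
    m1 = first_zagreb(degrees)
    return -sum((d * d / m1) * math.log(d * d / m1) for d in degrees)

def entropy_via_m1(degrees):
    """I_{d,2} rewritten as log(M_1) minus a degree-weighted correction term."""
    m1 = first_zagreb(degrees)
    return math.log(m1) - sum(d * d * math.log(d * d) for d in degrees) / m1

# Degree sequences of the carbon skeletons (hydrogens suppressed).
isomers = {
    "n-decane":           [1, 2, 2, 2, 2, 2, 2, 2, 2, 1],
    "2-methylnonane":     [1, 3, 1, 2, 2, 2, 2, 2, 2, 1],
    "2,2-dimethyloctane": [1, 4, 1, 1, 2, 2, 2, 2, 2, 1],
}

for name, degs in isomers.items():
    print(name, first_zagreb(degs), forgotten_index(degs),
          second_degree_entropy(degs), entropy_via_m1(degs))
```

The two entropy columns agree for each isomer, which shows that $I_{d,2}$ is determined by $M_1$ together with one further degree-based sum.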

Concluding Remarks
The role of the entropy corresponding to the first Zagreb index in structure-property modeling has been investigated in this work. The $I_{d,2}$ index has been found to have significant predictive potential for physicochemical properties of octane isomers. The linear relation of $I_{d,2}$ with entropy, enthalpy of vaporization, standard enthalpy of vaporization, and acentric factor has been found to be satisfactory. In particular, the performance of $I_{d,2}$ in explaining AF is remarkable, and an external validation using nonane isomers supports this claim. The present entropy has been observed to model the boiling point and π-electron energy of benzenoid hydrocarbons with high accuracy. The $I_{d,2}$ index is also capable of explaining the boiling point and molar refraction of some compounds useful in drug preparation. The second-degree-based entropy performs better than some well-known and commonly used degree-based indices on the three data sets. This empirical study is expected to be extended to other data sets in the future, including aromatic and hetero-aromatic amines, polychlorobiphenyls, polyaromatic hydrocarbons, and so on. The strong correlation of $I_{d,2}$ with $M_1$ and F indicates that there may be a strong mathematical connection between them, which could be considered as a future research direction.